Abstract


This chapter presents an overview of different methods used in what is today commonly called AI methods. Many of the methods have existed for years, but they now form a platform of complementary techniques, a cluster of tools that can be used to build "learning systems". Physical and statistical models are used together and complemented with data cleaning and sorting. The models are then used for many different applications, such as output prediction, soft sensors, fault detection, diagnostics, decision support, classification, process optimization, model predictive control, maintenance on demand and production planning. In this chapter we give an overview of a number of methods and how they can be utilized in process industry applications.


Keywords: process industry, artificial intelligence (AI), learning system, 
soft sensors, machine learning 


1. Introduction 


During the 1980s AI was a hot topic in both academia and industry. Many researchers worked on methods for diagnostics, simulation and adaptation of models. Artificial Neural Networks (ANN) were implemented in real applications, e.g. soft sensors predicting the NOx concentration in exhaust gas from power plants. Still, there was considerable "over-selling": AI was expected to be useful almost immediately, but it took much longer to make the systems robust enough for practical use and fast enough for on-line applications. After the year 2000, systems started to reach a more mature state, and IBM's Watson could beat the Jeopardy champions. Later, Google's AlphaGo beat a master of Go, a very complex board game of Chinese origin. This has changed the perception of AI. The tools are still of a similar type to those developed during the 1980s, but they have been refined considerably and hardware has developed dramatically. This has given us a much more positive view of what can be done, and a lot is now being implemented. There is still a risk of over-selling, as the tools are normally not as "intelligent" as we usually mean when we talk about intelligence. But we are closing the gap day by day.

Concerning the use of AI in the process industry, we cannot just take the tools and hope they will fix everything. It is still important to identify the problem to solve. In Jeopardy the goal is to be good at Jeopardy, but what is the goal in the process industry? It should be to increase production, reduce process variations, implement maintenance on demand and give operator support. It also means coordinating and optimizing production lines, then complete plants and later on complete corporations. It also means adapting to changing customer demands, supporting the development of new products and production lines, and handling new business models. These different functions demand quite different tools, and thus we will use not one tool but several. Machine learning (ML) is often considered to be "the tool", but often there are not enough data available to implement ML, especially not when starting a new production line. To implement new tools, it is also very important to pre-treat the data. Data have to be sorted into "normal variations" and "anomalies". The data may need to be filtered with moving windows, with different window lengths for different time perspectives. Data reconciliation is needed to handle drifting sensors. And all levels need to be integrated, from orders and production planning down to coordinated and optimized production. In this chapter we discuss a number of different methods as well as the integration between the different levels.

Over the years many researchers have investigated different AI techniques for different process industry applications. A comprehensive review of different AI models applied in energy systems can be found in [1]. Applications of different AI tools based on simulation models in the pulp and paper industry have been presented by researchers including Dahlquist [2-5]. Applications in power plants have been presented in many articles, including Karlsson et al. [6-8]. Karlsson et al. [9] give a general discussion of how to make better use of data, including pretreatment of data, and adaptation of process models to degradation over time is discussed in Karlsson et al. [7]. An extensive review of different AI-based soft sensors in the process industries is given in [10].
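The pre-treatment step of filtering data with moving windows in different time perspectives can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions: a simple trailing moving average, two hypothetical window lengths (4 and 100 samples) and a synthetic signal with a slow drift plus alternating noise. The short window suppresses the noise while following the process; the long window shows the slow trend.

```python
from collections import deque

def moving_average(values, window):
    """Trailing moving average; the first outputs average the samples seen so far."""
    buf = deque(maxlen=window)
    total = 0.0
    out = []
    for v in values:
        if len(buf) == window:
            total -= buf[0]  # this sample is about to drop out of the window
        buf.append(v)
        total += v
        out.append(total / len(buf))
    return out

# Hypothetical sensor signal: slow drift plus an alternating +/-0.5 noise component
signal = [0.01 * k + (0.5 if k % 2 else -0.5) for k in range(200)]

fast = moving_average(signal, 4)    # short time perspective: removes noise, follows the process
slow = moving_average(signal, 100)  # long time perspective: shows the slow drift
print(round(fast[-1], 3), round(slow[-1], 3))
```

Comparing the two filtered series over time is one simple way to separate fast process changes from slow drift; in practice the window lengths would be chosen for the actual process.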


1.1 Similarities between AI and how the brain works 


The mathematicians who developed ANNs in particular have looked closely at how the brain works. Figure 1 shows a schematic picture of how a human handles input from the surroundings.

Running in a forest: the brain stores many different factors locally by "tuning many soft sensors". During the night the strength of the connections is enhanced for the most important functions, while other, less important connections are eliminated. Some information is used for direct control, while other information is stored for later use.

If it is rainy when you run, there is a general feeling that "this was not so nice". Everything else that happens in the forest will then be "colored" by this in your memory, apart from concrete events such as meeting a friend during the run.


Figure 1.
How a human handles input from the surroundings.
Short-term memory: the dorsolateral prefrontal cortex controls the information stream from the senses. The parietal lobe is involved in attention. The ventrolateral prefrontal cortex sorts information into useful and not useful. The supplementary motor area (SMA) repeats new memories over and over.

Long-term memory: the hippocampus and nearby areas in the medial temporal lobe are essential for long-term memory, where facts are stored. The cerebellum and the basal ganglia contain procedural memory, such as how to ride a bike or swim.

A human may have approximately 120 billion nerve cells, each connected to hundreds of other cells. Some connections enhance signals while others decrease them. The interactions are very complex, and connections are established and broken continuously. No exact values or memories exist for control; diffuse input gives diffuse output, but with different feedback mechanisms. The Swedish Nobel Prize winner Arvid Carlsson [11] discovered the mechanism of how signals are transferred from the axon of one cell to the dendrite of the next, where complex feedback mechanisms enhance a connection, and thereby also reinforce a memory, by changing how easily new signals are transferred. He explored how dopamine works as a signal substance, which we now know is of the highest importance in the brain. Back-propagation in ANNs tries to simulate this mechanism (Figure 2).

Input to the brain is sorted in the amygdala and hippocampus, and signals are sent to different parts of the brain. Here different signals are enhanced or decreased depending on previous experience in many different "soft sensors", built up by tuning Ca channels that work like the parameters ("enhancement factors") of a polynomial. The situation triggers the build-up of memory. All control is "diffuse", using many different "diffuse" measurements. Different individuals have different sensitivity and different numbers of sensors, e.g. for bitterness, sweetness and pain. Soft sensors receive input and react with output to other soft sensors. Signals are sent to direct different biochemical processes; fear, for example, increases the production of adrenaline and cortisol. This in turn affects many other hormones, proteins etc. The microbiome in the gut and on the skin also sends input to the brain on how these organs perform. When you run, the body feels good, and e.g. endorphins are produced, enhancing the performance of the stomach, muscles etc. Serotonin, insulin, cortisol etc. interact and tune each other, but with influence "from the side" by other sensor inputs. The brain interacts with all of this. This is also the basic concept that "deep learning" tries to mimic.

Figure 2.
Signal flow in the brain, with signals sent on to glands and organs: many connections and feedback loops enhance learning.

Transferred to a control system, this picture could look like Figure 3.

We start by sorting out "outliers" in pre-processing, which is what the brain does with information from the eyes etc. The outliers can be used for anomaly detection; in the brain this is principally done in the amygdala. We then compare predictions from simulators and soft sensors with measurements and trend how the differences develop over time. Refined data are used for building and adapting models. The models are used for soft sensors, diagnostics, control etc. We also draw conclusions in a decision tree from previous experience and identify the optimal action to take in different time perspectives. The brain does this by utilizing previous experience in a way where it tries to "make sense", which means that missing data are replaced with what is reasonable. In our computer system we do this by data reconciliation, e.g. by solving an equation system of physical models to get the best fit. We then take action by controlling many different functions. In the body this means e.g. control of the sugar content in the blood, release of adrenaline to meet threats, or release of melatonin to make you tired and go to sleep. We learn by tuning soft sensors and decision trees with new information just as the brain does, although the brain is very much more complex than what we can handle today.
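The data-reconciliation step can be illustrated with a small example. The sketch below is our assumption of one common formulation (weighted least squares with linear balance constraints), not necessarily the authors' implementation: flow measurements around a hypothetical splitter (F1 = F2 + F3) are adjusted as little as possible, weighted by each sensor's assumed variance, so that the mass balance closes exactly.

```python
import numpy as np

def reconcile(measured, sigma, A):
    """Minimise sum(((x - m) / sigma)**2) subject to A @ x = 0 (closed-form solution)."""
    m = np.asarray(measured, dtype=float)
    V = np.diag(np.asarray(sigma, dtype=float) ** 2)  # assumed measurement covariance
    residual = A @ m                                   # balance error of the raw data
    correction = V @ A.T @ np.linalg.solve(A @ V @ A.T, residual)
    return m - correction

# Hypothetical splitter: F1 - F2 - F3 = 0
A = np.array([[1.0, -1.0, -1.0]])
measured = [100.0, 61.0, 41.0]  # hypothetical raw readings; they do not balance
sigma = [2.0, 1.0, 1.0]         # hypothetical accuracies: the F1 sensor is trusted least
x = reconcile(measured, sigma, A)
print(np.round(x, 3))
```

The least trusted sensor (F1) receives the largest correction. Tracking the size of such corrections over time is one way to spot a drifting sensor.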


1.2 Market aspects 


The IndTech market, i.e. products and systems for industrial digitalization and automation, was worth around USD 340 billion worldwide in 2016/2017, with an average annual growth rate of 7-8 percent. The area can be divided into two parts: IT (industrial IT) and OT (operational technology). The share that can be categorized as industrial IT is about USD 110-120 billion. The remaining roughly USD 220 billion is operational technology for the factory floor and the field, which in turn is traditionally divided into discrete automation (about 45 percent) and process automation (about 55 percent). OT includes various types of industrial control systems (ICS) and field equipment such as instrumentation, analysis equipment, drive systems, motors, robots and similar.

For the future, we can expect AI to enter deeply into all these industrial market segments, and also far beyond industrial applications. Tools developed for one application will most probably be used in other applications as well.

Figure 3.
Schematic diagram of signal processing in a "learning system": pre-processing, classification of data, diagnostics, decision support, optimization and control, with feedback and adaptation of soft sensors.


2. Different AI methods 


Many different methods have been developed. Some of them are very similar or aim to solve the same type of problems. Within machine learning (ML) we have, e.g., regression, Artificial Neural Networks (ANN), Support Vector Machines (SVM), Principal Component Analysis (PCA) and Partial Least Squares regression (PLS). Broadly, they aim to sort different variables into groups that correlate with different properties or faults.

PLS and ANN are both very useful for creating soft sensors. Deep learning is a more sophisticated version of ANN, with the goal of producing models that can do much more than a soft sensor, which predicts one or more qualities. Examples of soft sensors are prediction of paper strength properties from e.g. NIR data and process variable values in paper machines, of the amounts of different kinds of plastics in waste combustion plants, or of the protein content of cereals in agriculture from NIR spectra. Deep learning, on the other hand, can be used, for instance, to teach a robot to pick scrapped machine components from a conveyor belt. This includes image pattern analysis of camera images of the parts passing by.

A selection of different tools is listed in Table 1. 


2.1 Machine learning methods 


Machine learning methods generally use large amounts of process data, preferably measured on-line, and identify correlation models from the data that can be used for purposes such as soft sensors and anomaly detection.

There are several different machine learning methods. Some correlate a specific property with process data. Reinforcement learning is described in e.g. Gattami [12]. It is used in problems where actions (decisions) have to be made and each action affects future states of the system. Success is measured by a scalar reward signal, and the aim is to maximize the reward (or minimize a cost) when no system model is available. One example of this technique is deep reinforcement learning, which was used in AlphaGo when it defeated the world champion in Go. In deep Q-learning, the Q function is approximated with a deep neural network, and the following loss function is minimized with respect to the network weights w:


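$$ l = \left( r(s,a) + \gamma \sup_{a'} Q(s', a', w^{-}) - Q(s, a, w) \right)^2 \qquad (1) $$

where $s'$ is the next state, $\gamma$ is a discount factor and $w^{-}$ are the weights of a periodically updated copy (target network) of the Q network.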


• Gaussian Process Regression (GPR)
• Partial Least Squares (PLS) Regression
• Principal Component Analysis (PCA)
• Artificial Neural Networks (ANN)
• Support Vector Machines (SVM)
• Gray box models
• Physical models, MPC (model predictive control)
• Bayesian networks (BN)
• Gaussian Mixture Model (GMM)
• Reinforcement Learning
• Google algorithm (search engines)

Table 1.
A selection of common AI tools.
If the system is deterministic, the model is given by

$$ s_{k+1} = f_k(s_k, a_k) \qquad (2) $$

If the system is stochastic, the model is given by

$$ P(s_{k+1} \mid s_k, a_k) \qquad (3) $$

$r(s_k, a_k)$ is a scalar-valued reward. Werbos [13], in "A menu of designs for reinforcement learning over time", describes reinforcement learning methods more generally.
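To make Eqs. (1)-(3) concrete, the sketch below uses tabular Q-learning on a hypothetical five-state chain. It is a deliberately small stand-in for the deep network in Eq. (1): a table entry Q(s, a) replaces the network output, and each update moves it towards the target r(s, a) plus the discounted best future value. The environment, rewards, discount factor and learning rate are illustrative assumptions.

```python
import random

N_STATES = 5          # hypothetical chain: states 0..4, state 4 is the goal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right
GAMMA = 0.9           # discount factor weighting the future value term (assumed)
ALPHA = 0.5           # learning rate (assumed)
GOAL = N_STATES - 1

def step(s, a):
    """Deterministic model s_{k+1} = f(s_k, a_k), as in Eq. (2), plus the reward r(s_k, a_k)."""
    s_next = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)
    reward = 1.0 if s_next == GOAL else 0.0
    return s_next, reward

def greedy(q_row):
    return max(ACTIONS, key=lambda act: q_row[act])

def train(episodes=200, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = 0
        while s != GOAL:
            # epsilon-greedy: mostly exploit the current Q table, sometimes explore
            a = rng.choice(ACTIONS) if rng.random() < epsilon else greedy(q[s])
            s_next, r = step(s, a)
            # Target r + gamma * max_a' Q(s', a'); the goal state has no future value
            future = 0.0 if s_next == GOAL else max(q[s_next])
            td_error = r + GAMMA * future - q[s][a]
            # Step that reduces the squared error of Eq. (1) for this table entry
            q[s][a] += ALPHA * td_error
            s = s_next
    return q

q_table = train()
policy = [greedy(q_table[s]) for s in range(GOAL)]
print(policy)
```

No system model is given to the learner; it only observes states and rewards, which is the situation where the text says reinforcement learning is used.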


2.2 Soft sensors 


Soft sensors can be created by building models that correlate on-line process measurements with quality measurements from samples analyzed in the laboratory. The soft sensor can then predict the quality property on-line when the on-line measurements are fed into the model. There are several different methods for the regression, and a number of alternatives are given in Figure 4.

Figure 5 shows what the data flow can look like for data collection, data pre-processing, model building and model validation. Here NIR measurements are correlated with properties such as lignin content.

Soft sensors can also be built with other methods, e.g. artificial neural nets (ANN). The different methods have advantages and disadvantages, but also commonalities. Good data are needed to build the models, which means that the data must be well spread over the value space. If the data contain only "white noise", the models will be unusable. All variables need to be varied in a systematic way to get useful data for model building.
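The point that "white noise" data give unusable models can be checked numerically. In this hypothetical example (Python with NumPy; the process gain, noise levels and sample sizes are our assumptions), the same linear soft sensor is fitted to data where the input only jitters around its set point and to data where it is deliberately moved over its range. The standard error of the fitted gain shows how much each data set can actually tell us.

```python
import numpy as np

rng = np.random.default_rng(1)
TRUE_GAIN = 2.0   # hypothetical effect of the process variable on the lab quality
LAB_NOISE = 0.2   # hypothetical standard deviation of the lab analysis

def lab_quality(x):
    return 5.0 + TRUE_GAIN * x + rng.normal(0.0, LAB_NOISE, size=x.shape)

def fit_gain(x, y):
    """Least-squares line y = b0 + b1*x; returns b1 and its standard error."""
    b1, b0 = np.polyfit(x, y, 1)
    resid = y - (b0 + b1 * x)
    s2 = resid @ resid / (len(x) - 2)
    se = np.sqrt(s2 / np.sum((x - x.mean()) ** 2))
    return b1, se

# Case 1: tightly controlled variable, only small "white noise" around the set point
x_noise = rng.normal(0.0, 0.02, size=30)
# Case 2: the same variable moved systematically between low, centre and high levels
x_design = np.tile([-1.0, 0.0, 1.0], 10)

gain_noise, se_noise = fit_gain(x_noise, lab_quality(x_noise))
gain_design, se_design = fit_gain(x_design, lab_quality(x_design))
print(f"white noise only:   gain {gain_noise:.2f} +/- {se_noise:.2f}")
print(f"designed variation: gain {gain_design:.2f} +/- {se_design:.2f}")
```

With the assumed numbers the standard error in the first case is roughly forty times larger, so the fitted gain there is essentially undetermined even though the underlying process is the same.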


2.3 Gaussian process regression model 


Gaussian Process Regression takes more memory but can give better regression models than many other methods, such as (nonlinear) system identification, neural networks and adaptive learning models. It can also be combined with physics-based models. The method is presented in e.g. Fredrik et al. [14]. Figure 6 shows a first attempt to predict the kappa number of pulp after a digester for two different wood


Figure 4 content (chemometric techniques):
• Mathematical pre-processing and pre-treatment methods: standard normal variate (SNV), multiplicative scatter correction (MSC), orthogonal signal correction (OSC); derivatives: Savitzky-Golay derivatives, spectral derivatives, polynomial derivative filter.
• Qualitative methods: principal component analysis (PCA), clustering methods (CA); linear: distance-based methods, linear discriminant analysis (LDA), regularized discriminant analysis (RDA), k-nearest neighbours (KNN), soft independent modelling of class analogy (SIMCA), partial least-squares discriminant analysis (PLS-DA); non-linear: artificial neural network (ANN) and support vector machines (SVM).
• Quantitative methods: linear: multiple linear regression (MLR), principal component regression (PCR), partial least-squares regression (PLS-R); non-linear: artificial neural network (ANN), support vector machines (SVM) and non-linear partial least-squares (N-PLS).

Figure 4.
A number of methods that can be used to develop soft sensor models from process data.

Figure 5.
Data flow for building and verification of soft sensors.

Figure 6.
Example of Gaussian process regression (GPR) for kappa prediction (blue = measured, red = predicted).


types, hardwood and softwood. The training data fit quite well, while the predictions are less good. By using more data and fine-tuning the estimation of the residence time in the reactor, the prediction power became significantly better: R² went from 0.54 to above 0.90.
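As a minimal sketch of how GPR produces both a prediction and an uncertainty, the code below implements the standard zero-mean GP posterior equations with a squared-exponential kernel in NumPy. The kernel choice, hyperparameters and the sine-shaped test data are assumptions for illustration; the kappa model described above is not reproduced here, and hyperparameter fitting, which a practical model needs, is omitted.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between two 1-D input arrays."""
    d = a[:, None] - b[None, :]
    return signal_var * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_predict(x_train, y_train, x_test, noise_var=1e-2, **kernel_args):
    """Posterior mean and variance of a zero-mean GP at x_test."""
    K = rbf_kernel(x_train, x_train, **kernel_args) + noise_var * np.eye(len(x_train))
    L = np.linalg.cholesky(K)                               # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    K_star = rbf_kernel(x_test, x_train, **kernel_args)
    mean = K_star @ alpha
    v = np.linalg.solve(L, K_star.T)
    var = np.diag(rbf_kernel(x_test, x_test, **kernel_args)) - np.sum(v ** 2, axis=0)
    return mean, var

# Hypothetical training data: a smooth response sampled on 0..5
x_train = np.linspace(0.0, 5.0, 20)
y_train = np.sin(x_train)
x_test = np.array([2.5, 10.0])   # one point inside the data, one far outside

mean, var = gp_predict(x_train, y_train, x_test, length_scale=1.0)
print(np.round(mean, 3), np.round(var, 3))
```

The n-by-n matrix K is what makes GPR memory-hungry for large data sets, and the variance that grows away from the training data is what gives GPR its built-in uncertainty estimate.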


2.4 Artificial neural nets, ANN 


Artificial neural nets try to mimic the brain. In a simple form, the calculation can be written as follows:
$$ y(t) = a_1 \left( \beta_{11} x_1(t) + \beta_{12} x_2(t) + \beta_{13} x_3(t) \right) + a_2 \left( \beta_{21} x_1(t) + \beta_{22} x_2(t) + \beta_{23} x_3(t) \right) \qquad (4) $$


Figure 7 shows three input variables on the left. Each variable is multiplied by a weight factor towards each of the two summation nodes, where the products are summed. These sums can then either be passed through a threshold or passed on directly, and are multiplied by a second constant a_i. The two products are summed again, giving a prediction of the desired property. When the net is built, the difference between the measured and predicted values is examined and the weight factors are adjusted until a good fit is obtained. After one set of input values has been tested, the next set is used, and so on for all available data, aiming for the fit that is best for all the data together. This is a simple net with only one "hidden layer", but much more complex versions with many variables and many layers are possible. With many layers, however, the net may fit the training data well but with a risk of "over-fitting", which means less reliable predictions on new data.
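The training procedure described above (compare prediction with measurement, adjust the weights, repeat over all data) can be sketched with plain NumPy for the net in Figure 7: three inputs, one hidden layer with two nodes and one output. The tanh "threshold", the output bias c, the learning rate and the synthetic target are our assumptions; this is gradient-descent back-propagation in its simplest full-batch form, not a production training routine.

```python
import numpy as np

def train_ann(X, y, n_hidden=2, lr=0.05, epochs=2000, seed=0):
    rng = np.random.default_rng(seed)
    beta = rng.normal(0.0, 0.5, size=(X.shape[1], n_hidden))  # input-to-hidden weight factors
    a = rng.normal(0.0, 0.5, size=n_hidden)                    # hidden-to-output constants a_i
    c = 0.0                                                    # output bias (an assumption)
    n = len(y)
    mse_history = []
    for _ in range(epochs):
        h = np.tanh(X @ beta)   # summation nodes passed through a smooth threshold
        y_hat = h @ a + c       # output: weighted sum of the hidden nodes
        err = y_hat - y
        mse_history.append(float(np.mean(err ** 2)))
        # Back-propagation: gradients of the mean squared error
        grad_a = 2.0 / n * (h.T @ err)
        grad_c = 2.0 * err.mean()
        grad_pre = np.outer(err, a) * (1.0 - h ** 2)   # chain rule through tanh
        grad_beta = 2.0 / n * (X.T @ grad_pre)
        a -= lr * grad_a
        c -= lr * grad_c
        beta -= lr * grad_beta
    return beta, a, c, mse_history

def predict(X, beta, a, c):
    return np.tanh(X @ beta) @ a + c

# Hypothetical data: three process variables and a quality that depends on them
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = 0.5 * X[:, 0] - 0.3 * X[:, 1] + 0.8 * X[:, 2]

beta, a, c, mse = train_ann(X, y)
print(f"MSE before training: {mse[0]:.3f}, after: {mse[-1]:.3f}")
```

In practice part of the data would be held back for validation, since a falling training error alone does not reveal over-fitting.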

One of the first commercial applications of ANN was prediction of NOx in power plants. Figure 8 shows a regression for power boiler number four in Västerås.


2.5 PLS (partial least squares) regression and factorial design of experiments


PLS is a popular way to build prediction models after performing a factorial design of experiments. The basic idea is to start with a linear regression for a line, y = a + b·x, add non-linearity with a term +c·x², and, if there is more than one variable, add the interaction between variables 1 and 2 with a term d·x1·x2. The polynomial for a property such as a paper strength property then becomes


$$ y = A + B x_1 + C x_2 + D x_1^2 + E x_2^2 + F x_1 x_2 \qquad (5) $$


Here A-F are constants obtained by fitting the model to the experimental data. Using a factorial design means that we try to expand the prediction space as much as possible within given limits, so that the experimental data are well distributed over all parts of the space, and not only close to the origin or in one part of the space. For example, you should not vary one variable at a time but vary all variables in a systematic way. The Box-Behnken design is described in more detail in Ferreira et al. [15]. Table 2 shows an example for three variables:


Figure 7.
A simple artificial neural net (ANN).

Figure 8.
Predicted vs. actual NOx: correlation between ANN predictions and measurements of the actual NOx content in the exhaust gases from a power plant (coal-fired boiler 4 at Mälarenergi).


Experiment no.   X1   X2   X3
1                +    +    +
2                +    +    -
3                +    -    +
4                +    -    -
5                -    +    +
6                -    +    -
7                -    -    +
8                -    -    -
9                0    0    0
10               √3   0    0
11               0    √3   0
12               0    0    √3

Table 2.
Factorial design of experiments with three important variables for predicting a certain quality variable, such as a paper property, lignin content or the content of different plastics.


The first eight experiments give the linear terms and the interactions, while the last four give the non-linear (quadratic) components. Because all variables are varied independently, the interactions between the variables are obtained directly. Here (+) denotes a high amount or concentration of the variable and (-) a low one; (0) is the origin (centre point), and √3 is where a sphere around the design cube crosses the axis.

It is important to have an even distribution of measurements over the whole sample volume. If the samples are concentrated around the origin, the impact of the "real" samples will be too small. It is better to have a few good, well-distributed samples than many around the origin or in some other part of the space. Varying several variables simultaneously also captures the interactions between them. One reason why models built only from on-line plant data sometimes have very little predictive power is that many important variables are held by controllers, so the data contain only the small "white noise" variations that the control leaves. By varying these variables systematically, as proposed by factorial design, we can build robust prediction models. If the models are still not good, it may be because we are not varying or measuring all important variables; then the variables in the factorial design should be changed. If you do not know which variables are the most important, you can start with the factorial design scheme in Table 2 but add more variables and just vary them around the origin and perhaps some other random point. From this first scan we can decide which variables to focus further experiments on.
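The design in Table 2 and a full quadratic model of the type in Eq. (5), extended to three variables, can be combined in a few lines of NumPy. In this sketch the "true" coefficients are invented so that the fit can be checked; with real data the responses would come from the experiments. Plain least squares is used here instead of PLS for brevity, so this illustrates the design and model structure rather than the PLS algorithm itself.

```python
import itertools
import numpy as np

# Table 2: 2^3 corner points, one centre point and three axial points at sqrt(3)
s3 = np.sqrt(3.0)
corners = [list(p) for p in itertools.product([1.0, -1.0], repeat=3)]
design = np.array(corners + [[0, 0, 0], [s3, 0, 0], [0, s3, 0], [0, 0, s3]], dtype=float)

def quadratic_terms(X):
    """Columns: 1, x1, x2, x3, x1^2, x2^2, x3^2, x1*x2, x1*x3, x2*x3."""
    x1, x2, x3 = X.T
    return np.column_stack([np.ones(len(X)), x1, x2, x3,
                            x1**2, x2**2, x3**2, x1*x2, x1*x3, x2*x3])

# Hypothetical "true" response, used only to generate test data
true_coef = np.array([10.0, 2.0, -1.0, 0.5, -0.8, 0.3, 0.0, 1.2, 0.0, -0.4])
M = quadratic_terms(design)
y = M @ true_coef

coef, *_ = np.linalg.lstsq(M, y, rcond=None)
print(np.round(coef, 3))
```

Twelve runs are enough to identify all ten terms: the corners determine the linear and interaction terms, and the centre and axial points separate the intercept from the three quadratic terms.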

The factorial design scheme can also be seen as values at the corners of a cube and at the points where the axes cross a sphere around the cube, as shown in Figure 9.

If it is expensive to run all experiments, a reduced factorial design can be used: in principle, some of the variants are picked at random and a PLS model is made. One or two experiments are then added to see how much the model improves, and this is repeated until the result is satisfactory. This is illustrated in Figure 10.

In principle, the regression starts with a line through all data in the space; the squared distance between each point and the line is calculated and summed over all points. The direction of the line is then changed and a new attempt is made, until the line with the least sum of squared errors has been found. An axis perpendicular to this first line is then added, and the procedure continues to find a plane.
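The search described above, trying line directions until the sum of squared perpendicular distances is smallest, can be sketched for two variables. As an assumption for illustration, the sketch scans directions on a one-degree grid over hypothetical data and compares the result with the singular value decomposition, which gives the same best direction in closed form (for the X data alone this is the first principal component).

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical centred data lying roughly along the direction (1, 2)
t = rng.normal(size=100)
data = np.column_stack([t, 2.0 * t]) + rng.normal(0.0, 0.1, size=(100, 2))
data -= data.mean(axis=0)

def sum_sq_perpendicular(points, angle):
    """Sum of squared distances from the points to a line through the origin at this angle."""
    direction = np.array([np.cos(angle), np.sin(angle)])
    along = points @ direction
    return np.sum(points ** 2) - np.sum(along ** 2)  # Pythagoras: total minus projected part

angles = np.deg2rad(np.arange(0.0, 180.0, 1.0))
errors = [sum_sq_perpendicular(data, a) for a in angles]
best_angle = angles[int(np.argmin(errors))]

# Closed-form answer: the first right-singular vector of the data matrix
_, _, vt = np.linalg.svd(data, full_matrices=False)
svd_angle = np.arctan2(vt[0, 1], vt[0, 0]) % np.pi
print(np.rad2deg(best_angle), np.rad2deg(svd_angle))
```

The next perpendicular axis would be found the same way on the part of the data not explained by the first line.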

One example can be seen in Figure 11. 


$$ \text{Strength} = A + B \cdot c_{\text{filler}} + C \cdot r_{\text{LF/SF}} + D \cdot c_{\text{filler}}^2 + E \cdot r_{\text{LF/SF}}^2 + F \cdot r_{\text{LF/SF}} \cdot c_{\text{filler}} \qquad (6) $$

where $c_{\text{filler}}$ is the concentration of filler and $r_{\text{LF/SF}}$ is the ratio of long fiber to short fiber.


Figure 9.
Factorial design with values in all corners of the cube and where the axes cross a sphere surrounding the cube.

Figure 10.
Reduced factorial design.


Figure 11.
The direction of the plane corresponds to the linear terms, the downward bending to the non-linearity, and the cross-bending of the surface shows the interaction between the variables x1, x2 and x3.


Figure 12 shows which wavelengths are important, and to what degree, for predicting the investigated property. At the top are the regression coefficients for AIL (acid insoluble lignin) and at the bottom those for ASL (acid soluble lignin).

The regression coefficients in Figure 12 differ significantly between the two properties, indicating that the chemistry differs quite a lot. This is because each wavelength corresponds to vibrations of a certain chemical bond, such as C-H, C-H2, C-O or C=O. The example is taken from Skvaril [16].

Confounding means that some effects cannot be studied independently of each other. This is very much the case in combustion processes, water treatment and process industries such as pulp and paper, which is why factorial design of experiments makes so much sense. In some cases, though, there is no interaction between the variables, and then it may be acceptable to build linear models, but this is the exception rather than the rule. There are a number of PLS methods. One popular version is PLS regression, presented by e.g. Wold et al. [17].


2.6 Fault diagnostics 


It is important to detect both process faults and sensor faults. This can be done in many different ways: you can listen to an engine for noise that indicates a fault, or measure that the temperature has become too high somewhere. Fault detection can be systematized using different tools, and Bayesian networks (BN) are a tool suited to identifying causal relations and the probabilities of different types of faults simultaneously.

Figure 12.
Example of regression between wavelengths and lignin concentration in wood.


2.6.1 Bayesian networks (BN) 


Bayesian networks are named after Thomas Bayes, an 18th-century English Presbyterian minister whose theorem on conditional probability they build on. A central issue is correlation versus causality. Correlation means that you can see how different variables are connected to each other, while causality goes a step further and identifies a true dependence between a variable and, e.g., a fault. If we see a correlation between homeopathic levels of a substance and an effect on health, this may be a correlation, but the homeopathic medicine is hardly causing the good health. Many correlations are just random! With a Bayesian network you try to find, and also quantify, the causality between different variables and a fault or similar event. If we have a lot of experimental data, we can use them to tune the BN; if we do not, but know from experience that a causal relation exists, we can make a reasonable guess of its importance relative to other variables and use this in the BN. This makes it possible to build prediction models without "big data" and to combine this input with real measurements in the plant.

Applications of BN for condition monitoring, root cause analysis (RCA) and decision support have been presented in e.g. Weidl, Madsen and Dahlquist [18] and in [19, 20], and adaptive RCA in Weidl et al. [21]. Weidl and Dahlquist [22] have also given a number of examples of RCA in pulp and paper industry applications such as digesters and screens. Weidl and Dahlquist [23] present applications to complex process operations more generally, using object-oriented BN.
Widarsson [24] presented a Bayesian network for decision support on soot blowing of superheaters in a biomass-fuelled boiler.

If we have a set of BN variables $U = \{A_i\}$ and the parent variables $pa(A_i)$ of each $A_i$, the chain rule for Bayesian networks gives the joint probability of all variables $A_i$ as the product of all conditional probability tables (CPTs) $P(S_k \mid H_1, H_2, \ldots, H_n)$. Here $S_k$ is a child node (an observed status, a value measured by some meter, a trend or similar) and $H_i$ is a parent node (an assumed cause or condition causing a change in the state of the child node). The CPTs can be trained with real measurements of conditions and related failures, or created from the experience of operators or process experts. This is of particular interest when you want to include faults that occur very seldom but are severe when they do happen. Training data can also be created by running a simulator with physical models and different faults.

The chain rule for all CPTs is given in Eq. 7.


$$ P(U) = P(A_1, \ldots, A_n) = \prod_i P(A_i \mid pa(A_i)) \qquad (7) $$


An example of a BN for a root cause analysis function for a screen, e.g. in the pulp and paper industry, can be seen in Figures 13 and 14.
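The chain rule in Eq. (7) can be applied directly for a very small network. In the hypothetical sketch below (plain Python; all probabilities are invented, loosely inspired by the screen example), two root causes, a stuck reject valve and a faulty pressure sensor, are parents of one observed child node, a high reject-pressure reading. The joint distribution is the product of the CPTs, and the probability of each root cause given the evidence is found by summing the joint over the other variables (exact inference by enumeration, which is only practical for small networks).

```python
from itertools import product

# Prior probabilities of the root causes (parent nodes); hypothetical values
p_valve_stuck = {True: 0.05, False: 0.95}
p_sensor_fault = {True: 0.02, False: 0.98}
# Hypothetical CPT of the child node: P(high reading = True | valve_stuck, sensor_fault)
p_high_reading = {
    (True, True): 0.99,
    (True, False): 0.90,
    (False, True): 0.80,
    (False, False): 0.01,
}

def joint(valve, sensor, high):
    """Eq. (7): P(U) is the product of each node's probability given its parents."""
    p_child = p_high_reading[(valve, sensor)]
    return p_valve_stuck[valve] * p_sensor_fault[sensor] * (p_child if high else 1.0 - p_child)

def posterior_given_high_reading():
    evidence = sum(joint(v, s, True) for v, s in product([True, False], repeat=2))
    p_valve = sum(joint(True, s, True) for s in [True, False]) / evidence
    p_sensor = sum(joint(v, True, True) for v in [True, False]) / evidence
    return p_valve, p_sensor

p_valve, p_sensor = posterior_given_high_reading()
print(f"P(valve stuck | high reading)  = {p_valve:.3f}")
print(f"P(sensor fault | high reading) = {p_sensor:.3f}")
```

With these invented numbers the valve comes out as the more probable cause, mainly because valve failures are assumed to be more common beforehand; with operator-supplied or simulator-generated CPTs the same calculation gives the RCA ranking.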


2.6.2 Anomaly detection 

If we have identified that a variable should be within certain limits, or we have made a model using SVM, PCA or similar, we can check whether the measured set of variables is within the boundaries of a class or group. Both types of measures can be used for anomaly detection. This is very useful for identifying whether the process is leaving normal operation even when no single variable has passed its limits.
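The point that a process can leave normal operation without any single variable passing its limit can be shown with two correlated variables. This sketch uses the Mahalanobis distance to the normal-operation data, one common multivariate alternative to the SVM or PCA models mentioned above; the data, the limit (about the 99 % quantile of a chi-square distribution with two degrees of freedom, 9.21) and the test points are assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
# Hypothetical normal operation: x2 follows x1 closely
x1 = rng.normal(0.0, 1.0, size=500)
x2 = x1 + rng.normal(0.0, 0.1, size=500)
normal_data = np.column_stack([x1, x2])

mean = normal_data.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(normal_data, rowvar=False))
LIMIT = 9.21  # approx. 99 % quantile of chi-square with 2 degrees of freedom (assumed limit)

def mahalanobis_sq(point):
    d = np.asarray(point, dtype=float) - mean
    return float(d @ cov_inv @ d)

def is_anomaly(point):
    return mahalanobis_sq(point) > LIMIT

print(is_anomaly([1.0, 1.05]))  # consistent with the usual x1-x2 relation
print(is_anomaly([1.0, -1.0]))  # each value within +/-3 sigma, but the combination is unusual
```

The second point would pass any single-variable limit check, yet it lies far outside the normal operating region in the joint space.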


2.7 Classification and clustering 
2.7.1 Principal component analysis (PCA) 


Wold et al. [25] presented PCA as a tool in an article as early as 1987. PCA is often included in the same software package as PLS but has a different use. In PCA we

Figure 13.
A Bayesian model for RCA (root cause analysis) of a screen, with a layer of possible root causes (e.g. pressure sensors, reject concentration, valve before the screen, reject valve opening), an evidence signal layer (e.g. pressure, reject flow) and a screen performance node (normal/failure).

Figure 14.
A schematic drawing of a screen with sensors.


plot all measured data onto different planes to see how the variables are distributed in each plane. From this we can see that variables close to each other affect a certain property in the same way, while those on the opposite side of the diagram have the opposite effect. Variables close to the origin probably have little effect on the studied property.

The scores are collected in the matrix T, where each column is the score vector of one principal component (PC). Each experiment thus has one score value on PC1 and one on PC2, and plotting all experiments in the PC1-PC2 plane gives a score plot such as Figure 15.

In Figure 15 the measurements are plotted as a time series, and a development from left to right along PC1 can be seen as time passes, showing that something is changing over time. We can also make a loading (p) plot, which shows how much each variable contributes to each PC. Each PC can be seen as a linear combination of the original variables


$$ PC_j = \sum_i p_{ij} x_i \qquad (8) $$


The loadings are the coefficients $p_{ij}$. Each variable can contribute to more than one PC; if there are more than two PCs, it can contribute to all of them. Figure 16 shows the p-plot for a number of variables:
Figure 15.
Score plot (t). Sample no. 1 is taken at t = 0, and the following numbers correspond to subsequent time steps.

Figure 16.
P-plot for eight variables in the PC1-PC2 coordinate system.


From Figure 16 we can see that X3 and X6 have a small impact, while X4 and X8 have a stronger but opposite impact. X1 and X2 follow each other closely.
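Score and loading plots such as Figures 15 and 16 can be computed with a singular value decomposition of the mean-centred data matrix, which gives the decomposition X = T P^T (plus a residual when fewer components are kept). The four variables below are hypothetical, constructed so that X1 and X2 follow each other, X3 has little impact and X4 moves opposite to X1. NumPy's SVD is used, and scaling to unit variance, often done in practice, is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
latent = rng.normal(size=n)  # hypothetical underlying process change
X = np.column_stack([
    latent + rng.normal(0.0, 0.05, n),   # X1
    latent + rng.normal(0.0, 0.05, n),   # X2 follows X1 closely
    rng.normal(0.0, 0.1, n),             # X3 has little to do with the main variation
    -latent + rng.normal(0.0, 0.05, n),  # X4 moves opposite to X1
])

Xc = X - X.mean(axis=0)                  # PCA works on mean-centred data
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
T = U * S            # scores: one column (score vector) per principal component
P = Vt.T             # loadings: column j holds the coefficients p_ij of PC j
explained = S**2 / np.sum(S**2)

print("explained variance ratio:", np.round(explained, 3))
print("PC1 loadings:", np.round(P[:, 0], 2))
```

Plotting the first two columns of T against each other gives a score plot, and plotting the first two columns of P gives a p-plot; in this example X1 and X2 land close together, X4 on the opposite side and X3 near the origin.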
Figure 17 shows that when the set of variables is within the circle, the process is running well, but when it goes outside, you should investigate and try to bring it back inside the circle. This resembles the "diffuse" control in the human body.

The p-plot can also be used to classify a number of faults. Figure 18 shows an 
example where vibrations, temperatures and electric power consumption were used 
to predict different types of faults. The faults were introduced in the lab and the 
variables measured. From this we could see that the variables formed different 
patterns. 
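Once each fault introduced in the lab has produced its own cluster, a new measurement can be classified by projecting it onto the PCs and picking the nearest cluster centre. The sketch below assumes this simple nearest-centroid rule; only the fault names come from Figure 18, while the centroid coordinates and the function name `classify` are invented for illustration.

```python
import numpy as np

# Hypothetical cluster centres in the PC1-PC2 plane, one per lab fault.
fault_centroids = {
    "friction in bearing": np.array([-2.0, 2.0]),
    "imbalance": np.array([0.5, 2.5]),
    "overheated": np.array([2.5, 1.5]),
    "loose bolt": np.array([-2.0, -1.5]),
    "stress on connection": np.array([1.5, -2.0]),
}

def classify(score):
    """Return the fault whose centroid is closest to the (PC1, PC2) score."""
    return min(fault_centroids,
               key=lambda name: np.linalg.norm(score - fault_centroids[name]))

print(classify(np.array([2.2, 1.2])))   # overheated
```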

PLS stands for partial least squares, or projection to latent structures. In 
principle, you perform an iterative PCA on both the X and the Y matrix, with the two 
decompositions interacting: 


$$X = T P^{T} + E \qquad (9)$$
$$Y = U Q^{T} + F \qquad (10)$$

Figure 17. 
Using the plot to control the process by keeping within a certain area of the PC1-PC2 space. 


Figure 18. 
Use of plots to classify different faults (clusters in the PC1-PC2 plane: friction in bearing, imbalance, overheated, loose bolt, stress on connection). 


Figure 19. 
The principles of PLS (partial least squares) regression. 


This is shown schematically in Figure 19. 

The Y scores U give starting values for the X scores T, and T is then used to update 
U, back and forth iteratively, so the two are interdependent. When the difference 
between two iterations falls below a certain value, this is taken as the result. 
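The back-and-forth iteration between U and T can be sketched with the NIPALS algorithm, a common way of computing PLS components. This is a minimal NumPy sketch under that assumption, not necessarily the implementation used in the work described here; the function name `nipals_pls`, the tolerance and the test data are hypothetical. The deflation step `X - t p^T` builds up the decomposition $X = TP^T + E$ one component at a time.

```python
import numpy as np

def nipals_pls(X, Y, n_components, tol=1e-10, max_iter=500):
    """Hypothetical minimal NIPALS PLS; X is n x m, Y is n x k (2-D arrays)."""
    X = X - X.mean(axis=0)                     # work on centred copies
    Y = Y - Y.mean(axis=0)
    T, U, P, Q, W = [], [], [], [], []
    for _ in range(n_components):
        u = Y[:, [0]]                          # starting values taken from Y
        t_old = np.zeros((X.shape[0], 1))
        for _ in range(max_iter):
            w = X.T @ u / (u.T @ u)            # X weights from the Y scores u
            w /= np.linalg.norm(w)
            t = X @ w                          # X scores
            q = Y.T @ t / (t.T @ t)            # Y weights from the X scores t
            q /= np.linalg.norm(q)
            u = Y @ q                          # updated Y scores (|q| = 1)
            if np.linalg.norm(t - t_old) < tol:
                break                          # change below tolerance: done
            t_old = t
        p = X.T @ t / (t.T @ t)                # X loadings
        b = (u.T @ t / (t.T @ t)).item()       # inner relation u ≈ b t
        X = X - t @ p.T                        # remove component: X = T P^T + E
        Y = Y - b * t @ q.T                    # remove the predicted part of Y
        for store, vec in zip((T, U, P, Q, W), (t, u, p, q, w)):
            store.append(vec)
    return tuple(np.hstack(m) for m in (T, U, P, Q, W))

# Hypothetical usage: one quality variable (PLS1) explained by four inputs.
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 4))
Y = X @ np.array([[1.0], [0.5], [0.0], [-2.0]]) + 0.1 * rng.normal(size=(20, 1))
T, U, P, Q, W = nipals_pls(X, Y, n_components=2)
print(T.shape, P.shape)   # (20, 2) (4, 2)
```

Because X is deflated by each component before the next one is extracted, the successive score vectors in `T` come out orthogonal.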

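There are several variants. PLS2 models all Y variables together, while PLS1 models 
each Y variable separately. PCR (principal component regression) also treats each Y 
separately, but without interaction between X and Y: first a PCA is made on X, and 
then Y is regressed on the result. PCR is often used by statisticians, while PLS is 
normally used by application engineers. 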
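For contrast with PLS, principal component regression first performs a PCA on X without looking at Y, and only afterwards regresses Y on the resulting scores. A minimal sketch, assuming a single Y variable and NumPy's `np.linalg.lstsq` for the regression step; the function name `pcr_fit` and the data are hypothetical. With all components kept, PCR reduces to ordinary least squares, which the noise-free usage example relies on to check the result.

```python
import numpy as np

def pcr_fit(X, y, n_components):
    """PCA on X first, then regress y on the X scores (no Y used in the PCA)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T                 # X loadings, found without y
    T = Xc @ P                              # X scores
    c, *_ = np.linalg.lstsq(T, y - y_mean, rcond=None)   # y regressed on T
    beta = P @ c                            # back to the original variables
    return beta, y_mean - x_mean @ beta     # slopes and intercept

# Hypothetical noise-free data, so the true coefficients should be recovered.
rng = np.random.default_rng(2)
X = rng.normal(size=(30, 3))
y = 2.0 + X @ np.array([1.0, -1.0, 0.5])
beta, intercept = pcr_fit(X, y, n_components=3)
print(np.round(beta, 3), round(intercept, 3))
```

Keeping fewer components than variables discards the low-variance directions of X, which is the usual reason for choosing PCR over plain least squares when X is highly collinear.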

The result of the PLS regression can be written as a polynomial. If only linear 
terms are used: 

$$Y_1 = A + BX_1 + CX_2$$

If nonlinear terms are also included: 

$$Y_1 = A + BX_1 + CX_2 + DX_1^2 + EX_2^2$$

If interaction between the variables is also included: 

$$Y_1 = A + BX_1 + CX_2 + DX_1^2 + EX_2^2 + FX_1X_2$$

If we have more than two variables, we add $X_3$, $X_3^2$ and the interactions between 
$X_3$ and the other variables, and so on. These polynomials are used to predict $Y_1$. 
If you want to study several quality aspects using the same experiments, you add 
polynomials for $Y_2$, $Y_3$ and so on in the same way, but of course with different 
constants. 
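The quadratic model with interaction can be fitted by building one column per term and solving for A to F. In this sketch ordinary least squares (`np.linalg.lstsq`) is used in place of PLS for brevity; the point is the expanded set of terms, and PLS with all components kept gives the same fit when the columns are independent. The 3 x 3 grid of settings and the coefficient values are hypothetical.

```python
import numpy as np

def quadratic_design(x1, x2):
    """Columns for Y1 = A + B*X1 + C*X2 + D*X1^2 + E*X2^2 + F*X1*X2."""
    return np.column_stack([np.ones_like(x1), x1, x2, x1**2, x2**2, x1 * x2])

# Hypothetical experiments: every combination of three levels for X1 and X2.
g1, g2 = np.meshgrid([-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0])
x1, x2 = g1.ravel(), g2.ravel()
true_coef = np.array([5.0, 2.0, -1.0, 0.5, 0.3, 1.5])   # invented A..F
y1 = quadratic_design(x1, x2) @ true_coef              # noise-free response

coef, *_ = np.linalg.lstsq(quadratic_design(x1, x2), y1, rcond=None)
for name, value in zip("ABCDEF", coef):
    print(f"{name} = {value:.2f}")
```

Three levels per variable are needed here: with only two levels, the columns for $X_1^2$ and $X_2^2$ would coincide with the constant column and D and E could not be estimated.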


2.7.2 Support vector machines (SVM) 


In SVM we try to find the balancing point (centre) of each cluster and then assign 
each measured value to the cluster centre it lies closest to. This gives a similar type 
of clustering, but it is usually applied to large data sets where you want to find out 
how many clusters there might be. You can systematically test more or fewer clusters 
and see, from a statistical perspective, how well the data fit each number of clusters. 
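The procedure described here, finding cluster centres, assigning each measurement to the nearest one and trying different numbers of clusters, is what k-means clustering does; a classical support vector machine is instead a supervised classifier that finds a maximum-margin boundary between labelled classes. The sketch below therefore implements plain k-means (not SVM) in NumPy, with a deterministic farthest-point initialisation chosen as a simplifying assumption, and prints the within-cluster sum of squares for 1 to 5 clusters. The data and the function names are hypothetical.

```python
import numpy as np

def _assign(X, centroids):
    """Index of the nearest centroid for every row of X."""
    dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dist.argmin(axis=1)

def kmeans(X, k, n_iter=100):
    """Plain k-means; returns labels, centroids and within-cluster sum of squares."""
    # Deterministic farthest-point initialisation (a simplifying assumption).
    centroids = [X[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[np.argmax(d)])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        labels = _assign(X, centroids)
        # Move each centroid to the mean of its points (keep it if the cluster is empty).
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    labels = _assign(X, centroids)
    inertia = float(np.sum((X - centroids[labels]) ** 2))
    return labels, centroids, inertia

# Synthetic data: three well-separated groups of 30 measurements each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.5, size=(30, 2)) for c in [(0, 0), (5, 5), (0, 5)]])
inertias = []
for k in range(1, 6):
    _, _, inertia = kmeans(X, k)
    inertias.append(inertia)
    print(f"k = {k}: within-cluster sum of squares = {inertia:.1f}")
```

For data like these the sum of squares should fall steeply up to three clusters and only slowly after that; the "knee" is one simple way of judging how many clusters the data support.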


2.8 Adaptive control 


Kumpati S. Narendra [26] presented adaptive control using neural networks. Since 
then, MPC, both "fixed" and adaptive, has come into use in many applications in the 
process industry. There is even a dedicated journal, the International Journal of 
Adaptive Control and Signal Processing. In a recent issue (April 2020), Merve et al. 
[27] discuss improving the transient performance of discrete-time model reference 
adaptive control architectures. This area ties AI, modeling and control together. 
3. Architectural structure 


Figure 20 outlines the structure implemented in the FUDIPO project (www.fudipo.eu) 
[28] with respect to its different functions. In the chapter about the data structure, 
Tieto addresses the different programs, which form two complementary setups. The 
first is a set of commercial software linked into the Tieto HMI3 platform. Examples of 
the commercial tools are MATLAB/Simulink for mathematical calculations and 
simulation, Hugin for Bayesian network configuration and Dymola for implementing 
Modelica models and simulating them with a simultaneous solver. 

The second setup uses primarily open source programs, such as Node-RED for 
configuring the complete system and linking everything together. MATLAB is replaced 
by Python, and Simulink and Dymola by OpenModelica; these are then complemented 
by other, simpler software for different functions. The idea is that you can test all 
functionalities together in the open source environment. For a smaller system, you 
can also configure and use this setup for "the real case". For a larger system, you 
will probably choose commercial software to also get support for the functions, and 
perhaps also a service contract with someone who can help sustain the system and 
upgrade it frequently as the production plant develops. 

From this overview we can see that there are many possibilities in using AI tools, 
but it also takes some effort to understand which tools are useful for solving 
specific problems. 


• The solutions must be robust: 100% of the operating space must be covered in a 
reasonable way. 


• Diagnostics must detect real faults but avoid detecting "false faults". 


• Autonomous systems may be good, but you have to identify their boundaries and 
limits, and which functions are important to work with. 


Figure 20. 
Layout of a complete system where different levels and functions are connected and integrated. The levels shown include: order plan, sales income and costs/expenses; optimization, production plan and modified production plan; risk of failure, maintenance on demand and decision support; machine learning, deviation (sim - meas), fault diagnostics and sensors; statistical models, physical models, model adaptation and model-based control; data pre-processing; and the plants served (micro-CHP fleet, WWTP, refineries, pulp and paper industries, power plants). 
• The problem to be solved must be defined! 

• Optimization and adaptive systems and functions should include all important 
functions. To achieve this, you also need to vary the important variables: you cannot 
train a system on constant values! A factorial design of "experiments" is therefore 
important. 

• Many new tools are becoming accessible, but you need to understand how they 
work! Do not guess. 
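The point about not training on constant values can be made concrete with a two-level full factorial design, which varies every chosen variable between a low and a high setting in all combinations, so that 3 variables give 2³ = 8 runs. A minimal sketch using only the Python standard library (`itertools.product`); the function name `full_factorial`, the variable names and the levels are hypothetical.

```python
from itertools import product

def full_factorial(levels):
    """Every combination of the given levels for each variable."""
    names = list(levels)
    return [dict(zip(names, combo)) for combo in product(*levels.values())]

# Hypothetical process variables, each with a low and a high setting.
levels = {"temperature_C": [150, 170], "flow_kg_s": [2.0, 3.0], "pressure_bar": [5, 7]}
runs = full_factorial(levels)
for i, run in enumerate(runs, start=1):
    print(i, run)
```

With many variables the number of runs grows quickly (2ⁿ), which is why fractional factorial designs are often used in practice; the full design is shown here only because it is the simplest case.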